NSF PAR Search | NSF Public Access Repository

Dynamic Tracking, MLOps, and Workflow Integration: Enabling Transparent Reproducibility in Machine Learning

https://doi.org/10.1109/e-Science62913.2024.10678658

Safri, Hamza; Papadimitriou, George; Deelman, Ewa (September 2024, IEEE)

Full Text Available
GPU Reliability Assessment: Insights Across the Abstraction Layers

https://doi.org/10.1109/CLUSTER59578.2024.00008

Yang, Lishan; Papadimitriou, George; Sartzetakis, Dimitris; Jog, Adwait; Smirni, Evgenia; Gizopoulos, Dimitris (September 2024, IEEE)

Graphics Processing Units (GPUs) are widely deployed and utilized across various computing domains including cloud and high-performance computing. Considering its extensive usage and increasing popularity, ensuring GPU reliability is crucial. Software-based reliability evaluation methodologies, though fast, often neglect the complex hardware details of modern GPU designs. This oversight could lead to misleading measurements and misguided decisions regarding protection strategies. This paper breaks new ground by conducting an in-depth examination of well-established vulnerability assessment methods for modern GPU architectures, from the microarchitecture all the way to the software layers. It highlights divergences between popular software-based vulnerability evaluation methods and the ground truth cross-layer evaluation, which persist even under strong protections like triple modular redundancy. Accurate evaluation requires considering fault distribution from hardware to software. Our comprehensive measurements offer valuable insights into the accurate assessment of GPU reliability.
Full Text Available
A Workflow Management System Approach To Federated Learning: Application to Industry 4.0

https://doi.org/10.1109/DCOSS-IoT61029.2024.00047

Safri, Hamza; Papadimitriou, George; Desprez, Frédéric; Deelman, Ewa (April 2024, IEEE)

Full Text Available
FlyPaw: Optimized Route Planning for Scientific UAV Missions

https://doi.org/10.1109/e-Science58273.2023.10254831

Grote, Andrew; Lyons, Eric; Thareja, Komal; Papadimitriou, George; Deelman, Ewa; Mandal, Anirban; Calyam, Prasad; Zink, Michael (October 2023, IEEE)

Full Text Available
Silent Data Errors: Sources, Detection, and Modeling

https://doi.org/10.1109/VTS56346.2023.10139970

Singh, Adit; Chakravarty, Sreejit; Papadimitriou, George; Gizopoulos, Dimitris (April 2023, 2023 IEEE VLSI Test Symposium (VTS))

Full Text Available
Experiments on Network Services for Video Transmission using FABRIC Instrument Resources

https://doi.org/10.1109/INFOCOMWKSHPS57453.2023.10225817

Morel, Alicia Esquivel; Gafurov, Durbek; Calyam, Prasad; Wang, Cong; Thareja, Komal; Mandal, Anirban; Lyons, Eric; Zink, Michael; Papadimitriou, George; Deelman, Ewa (May 2023, IEEE INFOCOM 2023 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS))

Full Text Available
FlyNet: Drones on the Horizon

https://doi.org/10.1109/MIC.2023.3260440

Morel, Alicia Esquivel; Qu, Chengyi; Calyam, Prasad; Wang, Cong; Thareja, Komal; Mandal, Anirban; Lyons, Eric; Zink, Michael; Papadimitriou, George; Deelman, Ewa (May 2023, IEEE Internet Computing)

Full Text Available
Network Services Management using Programmable Data Planes for Visual Cloud Computing

https://doi.org/10.1109/ICNC57223.2023.10074183

Morel, Alicia Esquivel; Calyam, Prasad; Qu, Chengyi; Gafurov, Durbek; Wang, Cong; Thareja, Komal; Mandal, Anirban; Lyons, Eric; Zink, Michael; Papadimitriou, George; et al (February 2023, 2023 International Conference on Computing, Networking and Communications (ICNC))

Full Text Available
Automating Edge-to-cloud Workflows for Science: Traversing the Edge-to-cloud Continuum with Pegasus

https://doi.org/10.1109/CCGrid54584.2022.00098

Tanaka, Ryan; Papadimitriou, George; Viswanath, Sai Charan; Wang, Cong; Lyons, Eric; Thareja, Komal; Qu, Chengyi; Esquivel, Alicia; Deelman, Ewa; Mandal, Anirban; et al (May 2022, 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid))

In this paper, we describe how we extended the Pegasus Workflow Management System to support edge-to-cloud workflows in an automated fashion. We discuss how Pegasus and HTCondor (its job scheduler) work together to enable this automation. We use HTCondor to form heterogeneous pools of compute resources and Pegasus to plan the workflow onto these resources and manage containers and data movement for executing workflows in hybrid edge-cloud environments. We then show how Pegasus can be used to evaluate the execution of workflows running on edge only, cloud only, and edge-cloud hybrid environments. Using the Chameleon Cloud testbed to set up and configure an edge-cloud environment, we use Pegasus to benchmark the executions of one synthetic workflow and two production workflows: CASA-Wind and the Ocean Observatories Initiative Orcasound workflow, all of which derive their data from edge devices. We present the performance impact on workflow runs of job and data placement strategies employed by Pegasus when configured to run in the above three execution environments. Results show that the synthetic workflow performs best in an edge only environment, while the CASA-Wind and Orcasound workflows see significant improvements in overall makespan when run in a cloud only environment. The results demonstrate that Pegasus can be used to automate edge-to-cloud science workflows and the workflow provenance data collection capabilities of the Pegasus monitoring daemon enable computer scientists to conduct edge-to-cloud research.
Full Text Available
Mining Workflows for Anomalous Data Transfers

https://doi.org/10.1109/MSR52588.2021.00013

Tu, Huy; Papadimitriou, George; Kiran, Mariam; Wang, Cong; Mandal, Anirban; Deelman, Ewa; Menzies, Tim (May 2021, MSR '21)

Modern scientific workflows are data-driven and are often executed on distributed, heterogeneous, high-performance computing infrastructures. Anomalies and failures in the workflow execution cause loss of scientific productivity and inefficient use of the infrastructure. Hence, detecting, diagnosing, and mitigating these anomalies are immensely important for reliable and performant scientific workflows. Since these workflows rely heavily on high-performance network transfers that require strict QoS constraints, accurately detecting anomalous network performance is crucial to ensure reliable and efficient workflow execution. To address this challenge, we have developed X-FLASH, a network anomaly detection tool for faulty TCP workflow transfers. X-FLASH incorporates novel hyperparameter tuning and data mining approaches for improving the performance of the machine learning algorithms to accurately classify the anomalous TCP packets. X-FLASH leverages XGBoost as an ensemble model and couples XGBoost with a sequential optimizer, FLASH, borrowed from search-based Software Engineering to learn the optimal model parameters. X-FLASH found configurations that outperformed the existing approach up to 28%, 29%, and 40% relatively for F-measure, G-score, and recall in less than 30 evaluations. From (1) large improvement and (2) simple tuning, we recommend future research to have additional tuning study as a new standard, at least in the area of scientific workflow anomaly detection.
Full Text Available
